首页> 外文OA文献 >Combinatorial Network Optimization with Unknown Variables: Multi-Armed Bandits with Linear Rewards
【2h】

Combinatorial Network Optimization with Unknown Variables: Multi-Armed Bandits with Linear Rewards

机译:具有未知变量的组合网络优化:多臂   带有线性奖励的强盗

代理获取
本网站仅为用户提供外文OA文献查询和代理获取服务,本网站没有原文。下单后我们将采用程序或人工为您竭诚获取高质量的原文,但由于OA文献来源多样且变更频繁,仍可能出现获取不到、文献不完整或与标题不符等情况,如果获取不到我们将提供退款服务。请知悉。

摘要

In the classic multi-armed bandits problem, the goal is to have a policy fordynamically operating arms that each yield stochastic rewards with unknownmeans. The key metric of interest is regret, defined as the gap between theexpected total reward accumulated by an omniscient player that knows the rewardmeans for each arm, and the expected total reward accumulated by the givenpolicy. The policies presented in prior work have storage, computation andregret all growing linearly with the number of arms, which is not scalable whenthe number of arms is large. We consider in this work a broad class ofmulti-armed bandits with dependent arms that yield rewards as a linearcombination of a set of unknown parameters. For this general framework, wepresent efficient policies that are shown to achieve regret that growslogarithmically with time, and polynomially in the number of unknown parameters(even though the number of dependent arms may grow exponentially). Furthermore,these policies only require storage that grows linearly in the number ofunknown parameters. We show that this generalization is broadly applicable anduseful for many interesting tasks in networks that can be formulated astractable combinatorial optimization problems with linear objective functions,such as maximum weight matching, shortest path, and minimum spanning treecomputations.
机译:在经典的多臂匪徒问题中,目标是制定一项动态操作武器的政策,每项政策都会产生带有未知手段的随机奖励。兴趣的关键指标是后悔,定义为无知的,知道每条手臂的奖励手段的无所不能的玩家积累的预期总奖励与给定策略积累的预期总奖励之间的差距。先前工作中介绍的策略的存储,计算和后悔都随着分支数量的增加而线性增长,而分支数量很大时就无法扩展。我们在这项工作中考虑了带有依赖臂的一类广泛的多臂匪,它们以一组未知参数的线性组合形式产生回报。对于此通用框架,我们提出了有效的策略,这些策略显示出可以实现遗憾,该遗憾会随着时间呈对数增长,并且未知参数的数量呈多项式增长(即使从属分支的数量可能呈指数增长)。此外,这些策略仅要求存储的未知参数数量呈线性增长。我们表明,这种概括适用于网络中的许多有趣任务,这些任务可以用线性目标函数(例如最大权重匹配,最短路径和最小生成树计算)表达为可组合的优化问题。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号